The observations of gravitational-wave signals from astrophysical sources such as binary inspirals
will be used to test general relativity for self-consistency and against alternative theories of gravity. I
describe a simple formula that can be used to characterize the prospects of such tests, by estimating 
the matched-filtering signal-to-noise ratio required to detect non-general-relativistic corrections of a 
given magnitude. The formula is valid for sufficiently strong signals; it requires the computation of 
a single number (the fitting factor between the general-relativistic and corrected waveform families); 
and it can be applied to all tests that embed general relativity in a larger theory, including tests
of individual theories such as Brans-Dicke gravity, as well as the phenomenological schemes that 
introduce corrections and extra terms in the post-Newtonian phasing expressions of inspiral waveforms.
Using the formula, I show on very general grounds that the volume-limited gravitational-wave
searches performed with second-generation ground-based detectors would detect alternative-gravity 
corrections to general-relativistic waveforms as small as 1-10% (i.e., fitting factors of 0.9 to 0.99). 



I. INTRODUCTION AND MAIN RESULTS 

The possibility of performing high-precision tests of general relativity (GR) in its dynamical, strong-gravity regime [1] is perhaps the most exciting prospect of the budding field of gravitational-wave (GW) astronomy [2]. Several authors have carried out detailed analyses of such tests for both ground-based and space-based GW detectors [3-25], and by and large the tests proposed so far belong to two classes.

In the first, GR is tested against well-defined alternative theories, such as scalar-tensor or massive-graviton theories, which recover GR for particular values of one or more additional parameters, such as the Brans-Dicke coupling constant or the graviton mass [5-18]. Thus, the strength of the tests is characterized by the accuracy with which the alternative-theory parameters can be measured and either found to be consistent with GR, or to deviate from it.

In the second class of tests, GR is tested for self-consistency by treating some of the coefficients in the post-Newtonian (PN) expansion of the phasing as free variables rather than deterministic functions of the source parameters, and verifying whether the recovered values are consistent with GR predictions [19-22]. The strength of these tests is characterized by the amplitude of the deviations from GR that could be discerned in the PN coefficients. More general tests are possible with the parametrized post-Einstein (ppE) formalism [221 US], which, in addition to modifying the PN coefficients, adds extra terms to the PN amplitude and phasing and to the merger and ringdown waveforms, and recovers individual alternative theories for specific forms of the extra terms.

As advocated in [21, 25], GR-by-GW tests find a more satisfying formulation in Bayesian model selection [27, 28], which compares the Bayesian evidence, given the observed data $s$, for the alternative-theory/modified-GR scenario (henceforth "AG," for "alternative gravity") and for the Einstein-GR hypothesis. Model selection was applied to the PN consistency tests in Refs. pMl |2"91 150] and to ppE inspiral waveforms in [25]. (For a comprehensive discussion of model selection in the context of GW detection, rather than GR tests, see also Refs. [31-34].) To wit, in model selection we compute the Bayesian odds ratio

$$\mathcal{O} = \frac{P(\mathrm{AG}|s)}{P(\mathrm{GR}|s)} = \frac{P(\mathrm{AG})}{P(\mathrm{GR})} \, \frac{\int p(s|\theta^{i,a})\, p(\theta^{i,a})\, d\theta^{i,a}}{\int p(s|\theta^{i})\, p(\theta^{i})\, d\theta^{i}}, \qquad (1)$$

where $P(\mathrm{AG})$ and $P(\mathrm{GR}) = 1 - P(\mathrm{AG})$ are the prior probabilities assigned to the AG and GR hypotheses; $\theta^i$ and $\theta^a$ are the source parameters (masses, spins, etc.) and additional AG parameters, respectively; $p(s|\theta)$ is the likelihood of the observed data $s$ given $\theta$; and $p(\theta)$ is the prior probability distribution for $\theta$ (footnote 1). The odds ratio describes the degree to which we should prefer one hypothesis over the other after having observed the data, and it incorporates the Bayesian law of parsimony (a.k.a. Occam's razor): although models with additional parameters will always fit the data better, they will be relatively disfavored by the improbability that more parameters assume particular values in their prior ranges [23, 25].

A cogent way of understanding the statistical significance of odds ratios is to set up a decision scheme based on the value of the odds ratio $\mathcal{O}$ [2Q1 EI]. Namely, we declare that we have detected AG whenever $\mathcal{O}$ is greater than a set threshold $\mathcal{O}_\mathrm{thr}$. We set $\mathcal{O}_\mathrm{thr}$ by requiring a given false-alarm rate $F$: this is the fraction of observations in which the underlying signal is GR, but $\mathcal{O}$ happens to pass the threshold. $F$ gets smaller the more averse we are to falsely claiming AG detection, and its choice in practice should be guided by the prior $P(\mathrm{AG})$. Now, for a given $\mathcal{O}_\mathrm{thr}$, the efficiency $E$ of detection is the fraction of observations in which the underlying signal is AG, and $\mathcal{O}$ passes the threshold, so AG is detected correctly (footnote 2). A way of understanding the strength of a test of GR is then to choose a reasonably low $F$ (say, $10^{-4}$) and ask how strong an AG effect and how loud a GW signal we would need to detect AG with reasonably high $E$ (say, 1/2, but it turns out in practice that $E$ rises sharply after that).

1 In this paper we forgo annotating probabilities with the customary conditional dependence on "all other" assumptions, usually denoted as $I$.

2 The performance of decision schemes is characterized by their receiver operating characteristic $E(F)$ [35]. Note that the term "fraction", used above in defining $F$ and $E$, is ideally the fraction of an infinite number of observations of the same GW signal immersed in different realizations of noise. This characterization of decision schemes is therefore a frequentist statement (about the Bayesian statistic $\mathcal{O}$), but one that this Bayesian author finds very reasonable.

In Ref. [25], Cornish and colleagues point out that the odds ratio for AG over GR grows with the signal-to-noise ratio (henceforth, SNR) of the residual obtained after the best-fit GR waveform has been subtracted from the data; thus, alternative models that are not fit well by varying the GR parameters can be detected more easily than models that are. Indeed, Cornish and colleagues show that in the limit of large signal SNR and small AG deviations the logarithm of the odds ratio scales as $(1 - \mathrm{FF})\,\mathrm{SNR}^2$, with FF the fitting factor [36] between the GR and AG waveforms:

$$\mathrm{FF}(\theta_\mathrm{AG}) = \max_{\theta_\mathrm{GR}} \frac{\big(h_\mathrm{GR}(\theta_\mathrm{GR}),\, h_\mathrm{AG}(\theta_\mathrm{AG})\big)}{|h_\mathrm{GR}(\theta_\mathrm{GR})|\,|h_\mathrm{AG}(\theta_\mathrm{AG})|}. \qquad (2)$$

Here $h_\mathrm{GR}(\theta_\mathrm{GR})$ and $h_\mathrm{AG}(\theta_\mathrm{AG})$ are the GR and AG waveform families (so $\theta_\mathrm{GR} \equiv \theta^i$ and $\theta_\mathrm{AG} \equiv \theta^{i,a}$), and $(\cdot,\cdot)$ is the standard noise-weighted inner product, such that the sampling probability of a Gaussian-noise realization $n$ is $\propto e^{-(n,n)/2}$, and the optimal matched-filtering SNR of an observed signal $h$ is its norm $|h| = (h,h)^{1/2}$ (see, e.g., [37]). In the FF, the parameters $\theta_\mathrm{AG}$ are fixed by the AG waveform contained in the data, and the inner product is maximized over $\theta_\mathrm{GR}$. The FF is by definition independent of SNR, and it tends to one when the AG corrections vanish or can be completely reabsorbed by varying $\theta_\mathrm{GR}$.

In this paper I formalize and generalize this scaling statement by deriving the full expression of the odds ratio for the AG and GR hypotheses, in the limit of large SNR; the result is valid when AG embeds GR, which is the case for all classes of tests discussed above (footnote 3; see Sec. II). Moreover, I derive the decision-scheme statistics for the resulting odds ratio, and show that the efficiency $E(F)$ is a remarkably simple function [Eq. (19), a combination of the error function and its inverse] of the effective signal-to-noise ratio $\mathrm{SNR}\sqrt{1 - \mathrm{FF}}$ (see Sec. III). No other information about the waveforms is needed.

3 I thank Curt Cutler for pointing out that this is true also for the PN-coefficient tests.

[Figure 1: source SNR required for AG detection (left vertical axis, logarithmic, 100 to 10,000) plotted against $\log_{10}(1 - \mathrm{FF})$.]

FIG. 1. SNR required for AG detection with efficiency $E = 1/2$, with false-alarm probability $F = 10^{-4}$ and $10^{-8}$, as a function of FF. The right-side vertical axis shows the number of events required in a volume-limited search with detection threshold of 8 to yield a loudest event with the (median) SNR on the left-side vertical axis.

Thus, AG detection by model comparison allows us to characterize very generally both kinds of tests discussed above, by computing the SNR required to positively detect an AG correction as a function of its FF. Given the sensitivity curve of the detector and the projected detection rates for a source class, we can then derive the magnitude of the AG corrections that we expect to be able to constrain in our observation campaigns. The FF can be computed from the GR Fisher matrix using the formulas of Ref. [38], or directly by maximizing the normalized product of Eq. (2) over $\theta_\mathrm{GR}$.
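The direct maximization mentioned above can be sketched on a toy problem. The Python example below is only an illustration under stated assumptions: white noise (so the inner product is a plain dot product), a hypothetical two-parameter "GR" family of monochromatic waves, and a hypothetical "AG" correction that adds a quadratic phase term `beta * t**2`. None of these choices comes from the paper, and the grid search yields a lower bound on the true maximum.

```python
import math

def inner(a, b):
    """Inner product for white noise of unit variance (an assumption)."""
    return sum(x * y for x, y in zip(a, b))

def gr_template(freq, phase, times):
    """Toy 'GR' family: a monochromatic wave with two parameters."""
    return [math.cos(2.0 * math.pi * freq * t + phase) for t in times]

def ag_signal(freq, beta, times):
    """Toy 'AG' signal: the same wave with an extra phase term beta * t**2."""
    return [math.cos(2.0 * math.pi * freq * t + beta * t * t) for t in times]

def fitting_factor(signal, times, freqs, phases):
    """Grid-search estimate of Eq. (2): best normalized match over the GR family."""
    signal_norm = math.sqrt(inner(signal, signal))
    best = -1.0
    for freq in freqs:
        for phase in phases:
            template = gr_template(freq, phase, times)
            template_norm = math.sqrt(inner(template, template))
            best = max(best, inner(signal, template) / (signal_norm * template_norm))
    return best

times = [k / 200.0 for k in range(200)]
freqs = [9.5 + k / 40.0 for k in range(41)]            # includes freq = 10 exactly
phases = [2.0 * math.pi * k / 32.0 for k in range(32)]  # includes phase = 0

ff_gr = fitting_factor(ag_signal(10.0, 0.0, times), times, freqs, phases)
ff_ag = fitting_factor(ag_signal(10.0, 2.0, times), times, freqs, phases)
print(f"FF with no correction: {ff_gr:.6f}; FF with beta = 2: {ff_ag:.4f}")
```

With no correction the signal is itself a member of the GR family, so the match reaches one; with a nonzero `beta` part of the extra phase is reabsorbed by shifting frequency and phase, and only the remainder lowers the FF.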

The AG-detection SNR is shown in Fig. 1 for $F = 10^{-8}$ to $10^{-4}$, and it is a rather exacting function of $1 - \mathrm{FF}$. For the typical observations at the event-detection threshold (SNR ~ 8) produced by volume-limited searches, only 10% AG corrections ($1 - \mathrm{FF} = 0.1$) would be detectable. The required SNR grows roughly threefold for each decade of $1 - \mathrm{FF}$, to SNR > 30 for 1% effects, SNR > 100 for one-in-a-thousand effects, and SNR > 1,000 for one-in-ten-thousand effects.

We can also compute easily the total volume-limited detection rates that would yield one event strong enough (on the median) to detect AG corrections with a given $1 - \mathrm{FF}$ (see Sec. III); these are shown on the right-side vertical axis of Fig. 1. Comparison with the expected binary-inspiral detection rates for second-generation ground-based detectors [39] suggests that precise tests of GR would have to wait for the much higher rates afforded by third-generation detectors [40]. Even pooling together the evidence from all observed events [41] may not help much, reducing the number of required detections by a factor of a few, because the evidence is dominated by the few loudest sources (see again Sec. III). By contrast, space-based observatories such as the LISA concept [42] (or its latest incarnation, the European-led eLISA [43]) are not volume-limited for some source classes, and would see some events with large SNRs.
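As a quick numerical illustration of the scaling quoted above, the following Python sketch tabulates the source SNR needed for AG detection as $\mathrm{SNR}_\mathrm{AG}/\sqrt{1 - \mathrm{FF}}$. It takes as given the effective detection SNRs of 2.75 and 4.05 (for $F = 10^{-4}$ and $10^{-8}$) that are quoted in Sec. III; the function name is hypothetical.

```python
import math

# Effective AG-detection SNRs quoted in Sec. III for E = 1/2
# (taken as inputs here, not recomputed).
SNR_AG = {1e-4: 2.75, 1e-8: 4.05}

def required_source_snr(one_minus_ff, snr_ag):
    """Source SNR such that SNR * sqrt(1 - FF) equals snr_ag."""
    return snr_ag / math.sqrt(one_minus_ff)

for exponent in range(1, 5):
    one_minus_ff = 10.0 ** (-exponent)
    low = required_source_snr(one_minus_ff, SNR_AG[1e-4])
    high = required_source_snr(one_minus_ff, SNR_AG[1e-8])
    print(f"1 - FF = 1e-{exponent}: SNR = {low:7.1f} (F = 1e-4), {high:7.1f} (F = 1e-8)")
```

Each decade in $1 - \mathrm{FF}$ multiplies the required SNR by $\sqrt{10} \approx 3.16$, which is the "roughly threefold" growth described in the text.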

The rest of this paper is organized as follows: in Sec. II I derive the odds ratio in the two cases where the underlying signal is AG and GR; in Sec. III I study the statistics of the AG decision scheme; in Sec. IV I discuss the significance and applications of these results.

II. AG-GR ODDS RATIO IN THE HIGH-SNR LIMIT



In the following, we let $\theta^i$ be the $m$-dimensional vector of GR parameters, and $\theta^\mu \equiv (\theta^i, \theta^a)$ the vector of AG parameters, which augments $\theta^i$ with the single AG parameter $\theta^a$; the derivation can be extended easily to more AG parameters. We write the true signal as $h_\mathrm{AG}(\theta^\mu_\mathrm{true}) = h_0 + \Delta h$, with $h_0$ a GR signal, and $\Delta h$ the AG correction, which we assume proportional to $\theta^a_\mathrm{true}$.

In a sufficiently small neighborhood of $\theta^\mu_\mathrm{true}$, the signal can be expanded as $h_\mathrm{AG}(\theta^\mu) = h_0 + \Delta h + \Delta\theta^\mu h_\mu$, with $\Delta\theta^\mu = \theta^\mu - \theta^\mu_\mathrm{true}$ and $h_\mu = \partial h/\partial\theta^\mu$, evaluated at $h_0$. If the SNR is sufficiently large, this approximation is valid throughout the region of parameter space that supports most of the likelihood [44].

We can now compute the value $P(\mathrm{AG}|s_\mathrm{AG})$ of the evidence for the AG hypothesis when the data contain an AG signal, $s_\mathrm{AG} = h_\mathrm{AG}(\theta^\mu_\mathrm{true}) + n$. The likelihood can be written as

$$p(s_\mathrm{AG}|\Delta\theta^\mu) = \mathcal{N} e^{-|s_\mathrm{AG} - h_\mathrm{AG}|^2/2} = \mathcal{N} e^{-|n - \Delta\theta^\mu h_\mu|^2/2}, \qquad (3)$$

and it is maximized by $\Delta\theta^\mu_\mathrm{ML} = (G^{-1})^{\mu\nu}(n, h_\nu)$, with $G_{\mu\nu} = (h_\mu, h_\nu)$ the $(m+1)$-dimensional AG Fisher matrix. Switching to parameters $\delta\theta^\mu = \Delta\theta^\mu - \Delta\theta^\mu_\mathrm{ML}$ that describe displacements around the maximum, we resum the exponential as

$$p(s_\mathrm{AG}|\delta\theta^\mu) = \mathcal{N} e^{-|n|^2/2 + (G^{-1})^{\mu\nu}(n,h_\mu)(n,h_\nu)/2 - G_{\mu\nu}\delta\theta^\mu\delta\theta^\nu/2}. \qquad (4)$$

The evidence follows by integrating out the $\delta\theta^\mu$, which we do under the assumption of flat priors $p(\theta^\mu) = 1/\Delta\theta^\mu_\mathrm{prior}$ in the relevant region of parameter space:

$$P(\mathrm{AG}|s_\mathrm{AG}) = P(\mathrm{AG}) \int p(\theta^\mu)\, p(s_\mathrm{AG}|\delta\theta^\mu)\, d\delta\theta^\mu = P(\mathrm{AG})\, \frac{(2\pi)^{(m+1)/2}\sqrt{|G^{-1}|}}{\Delta\theta^\mu_\mathrm{prior}}\, \mathcal{N} e^{-|n|^2/2 + (G^{-1})^{\mu\nu}(n,h_\mu)(n,h_\nu)/2}. \qquad (5)$$

This expression can be understood as the product of the maximum likelihood (the normalized exponential) with the prior $P(\mathrm{AG})$ and the Bayesian Occam factor (the fraction), which weighs (by volume) the region of uncertainty for the AG parameters after the observation against the region allowed by their priors. In the high-SNR limit, the posterior region of uncertainty is just the Fisher 1-$\sigma$ ellipsoid, which has volume proportional to $\sqrt{|G^{-1}|}$. The second term in the exponential is the enhancement of likelihood due to overfitting noise: this is a random variable (a function of the noise realization) with expectation value (footnote 4) equal to $m + 1$.

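We repeat this computation for the GR hypothesis, expanding the signal as $h_\mathrm{GR}(\theta^i) = h_0 + \Delta\theta^i h_i$, with $\Delta\theta^i = \theta^i - \theta^i_\mathrm{true}$, and integrating over $\delta\theta^i = \Delta\theta^i - (F^{-1})^{ij}(n + \Delta h, h_j)$, with $F$ the $m$-dimensional GR Fisher matrix. From the point of view of GR waveforms, $\Delta h$ behaves as an additional noise component. Thus

$$P(\mathrm{GR}|s_\mathrm{AG}) = P(\mathrm{GR})\, \frac{(2\pi)^{m/2}\sqrt{|F^{-1}|}}{\Delta\theta^i_\mathrm{prior}}\, \mathcal{N} e^{-|n+\Delta h|^2/2 + (F^{-1})^{ij}(n+\Delta h,\, h_i)(n+\Delta h,\, h_j)/2}, \qquad (6)$$

where $F_{ij} = (h_i, h_j)$ is the $m$-dimensional Fisher matrix.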

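We can now form the odds ratio $\mathcal{O}_\mathrm{AG} = P(\mathrm{AG}|s_\mathrm{AG})/P(\mathrm{GR}|s_\mathrm{AG})$, using the shorthand $X_\mu = (X, h_\mu)$:

$$\mathcal{O}_\mathrm{AG} = \frac{P(\mathrm{AG})}{P(\mathrm{GR})}\, \frac{(2\pi)^{1/2}\sqrt{|G^{-1}|/|F^{-1}|}}{\Delta\theta^a_\mathrm{prior}}\, e^{[|\Delta h|^2 - (F^{-1})^{ij}\Delta h_i \Delta h_j]/2 + [(\Delta h, n) - (F^{-1})^{ij}\Delta h_i n_j] + [(G^{-1})^{\mu\nu} n_\mu n_\nu - (F^{-1})^{ij} n_i n_j]/2}; \qquad (7)$$

this expression can be simplified considerably by noting that $(F^{-1})^{ij} h_i (h_j, \cdot)$ acts as the linear projector $P_\mathrm{GR}$ onto the local tangent space of signal derivatives taken with respect to GR parameters, so

$$|\Delta h|^2 - (F^{-1})^{ij}\Delta h_i \Delta h_j = |(1 - P_\mathrm{GR})\Delta h|^2, \qquad (\Delta h, n) - (F^{-1})^{ij}\Delta h_i n_j = \big((1 - P_\mathrm{GR})\Delta h,\, n\big); \qquad (8)$$

thus it is only the component $\Delta h_\perp = (1 - P_\mathrm{GR})\Delta h$ of the AG correction that enters the odds ratio; this is indeed the residual that cannot be reabsorbed by shifting the estimated values of the GR parameters, and the larger the $\Delta h_\perp$, the more evidence there is for the AG hypothesis.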



4 From the definition of inner product as $(a, b) = 4\,\mathrm{Re}\int a^*(f)\, b(f)/S_n(f)\, df$ and the definition of noise spectral density $S_n$ from $\langle n^*(f)\, n(f') \rangle = S_n(f)\,\delta(f - f')/2$, it follows in general that $\langle (a, n)(n, b) \rangle = (a, b)$. Then $\langle (G^{-1})^{\mu\nu}(n, h_\mu)(n, h_\nu) \rangle = (G^{-1})^{\mu\nu} G_{\nu\mu} = \delta^\mu_\mu = m + 1$.
The Occam factor and noise-overfitting contributions to the maximum likelihood also bear some simplification: using the block-matrix decomposition of $G_{\mu\nu}$ and its inverse,

$$G_{\mu\nu} = \begin{pmatrix} F_{ij} & b_j \\ b_i & c \end{pmatrix} \;\Rightarrow\; (G^{-1})^{\mu\nu} = \begin{pmatrix} (F^{-1})^{ij} + (F^{-1})^{ik} b_k b_l (F^{-1})^{lj}/k & -(F^{-1})^{ik} b_k/k \\ -b_k (F^{-1})^{kj}/k & 1/k \end{pmatrix}, \qquad (9)$$

where $b_i = (h_i, h_a)$, $c = (h_a, h_a)$, and $k = c - b_i b_j (F^{-1})^{ij}$, we can show that

$$|G_{\mu\nu}| = |F_{ij}|\,\big[c - b_i b_j (F^{-1})^{ij}\big] = |F_{ij}|\,k, \qquad (G^{-1})^{\mu\nu} n_\mu n_\nu - (F^{-1})^{ij} n_i n_j = (\Delta h_\perp, n)^2/|\Delta h_\perp|^2, \qquad (10)$$

so

$$\mathcal{O}_\mathrm{AG} = \frac{P(\mathrm{AG})}{P(\mathrm{GR})}\, \frac{(2\pi)^{1/2}\,\Delta\theta^a_\mathrm{est}}{\Delta\theta^a_\mathrm{prior}}\, e^{|\Delta h_\perp|^2/2 + x|\Delta h_\perp| + x^2/2}, \qquad (11)$$

where $x = (\Delta h_\perp, n)/|\Delta h_\perp|$ is a normal random variable with zero mean and unit variance (see again footnote 4), and $\Delta\theta^a_\mathrm{est} = k^{-1/2}$ is the estimation error for the AG parameter, as given by the corresponding diagonal element of the inverse Fisher matrix $G^{-1}$. Remarkably (if logically), the odds ratio turns out to be a function of the posterior uncertainty and prior range for the additional AG parameter alone.

We can link $\Delta h_\perp$ to the fitting factor FF by finding the $\Delta\theta^i$ that maximizes the normalized match

$$\mathrm{FF} = \max_{\Delta\theta^i} \frac{(h_0 + \Delta h,\; h_0 + \Delta\theta^i h_i)}{|h_0 + \Delta h|\,|h_0 + \Delta\theta^i h_i|}, \qquad (12)$$

which is given (unsurprisingly) by $\Delta\theta^i = (F^{-1})^{ij}(\Delta h, h_j)$, and replacing it in Eq. (12), yielding

$$1 - \mathrm{FF} = \frac{1}{2}\frac{|\Delta h_\perp|^2}{|h_0|^2} = \frac{1}{2}\frac{|\Delta h_\perp|^2}{\mathrm{SNR}^2}, \qquad (13)$$

which is valid to $O(\mathrm{SNR}^{-4})$. Thus, for fixed FF the logarithm of the odds ratio scales as $\mathrm{SNR}^2$, just as it does in the Bayesian decision scheme for the (non)detection of a known signal in noise; for fixed SNR it scales as $1 - \mathrm{FF}$, so the odds ratio is larger with stronger and less reabsorbable AG deviations. The effects of detector noise add some statistical fluctuations through the random variable $x$.

This derivation can be repeated with small changes to yield the odds ratio when the data contain a GR signal, $s_\mathrm{GR} = h_\mathrm{GR}(\theta^i_\mathrm{true}) + n$, with $h_\mathrm{GR}(\theta^i_\mathrm{true}) = h_0$, leading to

$$\mathcal{O}_\mathrm{GR} = \frac{P(\mathrm{AG})}{P(\mathrm{GR})}\, \frac{(2\pi)^{1/2}\,\Delta\theta^a_\mathrm{est}}{\Delta\theta^a_\mathrm{prior}}\, e^{x^2/2}, \qquad (14)$$

where again $x$ is a normal random variable with zero mean and unit variance. Equations (11), (13), and (14) comprise the main novel result of this paper, and in the next section we use them to characterize the statistics of our decision scheme.
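The relation of Eq. (13) between $1 - \mathrm{FF}$ and the unabsorbed residual can be illustrated in a deliberately small signal space. The Python sketch below assumes white noise and a hypothetical three-dimensional space in which two axes span the GR tangent space (one of them along $h_0$, as for an amplitude parameter) and the third is orthogonal to every GR derivative; all numbers are illustrative.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def match(a, b):
    """Normalized inner product for white noise (an assumption)."""
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

def check_eq13(snr=50.0, dh=(0.3, -0.4, 1.5)):
    # Axes 0 and 1 span the GR tangent space; axis 2 is orthogonal to it.
    h0 = (snr, 0.0, 0.0)
    h_true = tuple(a + b for a, b in zip(h0, dh))
    # Best GR fit: h0 plus the projection of dh onto the tangent space.
    h_fit = (h0[0] + dh[0], h0[1] + dh[1], 0.0)
    dh_perp = (0.0, 0.0, dh[2])
    one_minus_ff = 1.0 - match(h_true, h_fit)
    predicted = 0.5 * norm(dh_perp) ** 2 / snr ** 2
    return one_minus_ff, predicted

exact, approx = check_eq13()
print(f"1 - FF = {exact:.3e}, Eq. (13) estimate = {approx:.3e}")
```

The two values agree up to corrections of relative order $|\Delta h|/\mathrm{SNR}$, and the components of the correction along the tangent space leave $1 - \mathrm{FF}$ almost unchanged, because they are reabsorbed by the GR fit.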



III. AG-GR DECISION SCHEME 

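The distribution of $\mathcal{O}_\mathrm{GR}$, as implied by the distribution of $x$ through Eq. (14), determines the background of false AG detections for a chosen threshold $\mathcal{O}_\mathrm{thr}$, quantified by the false-alarm probability $F = P(\mathcal{O}_\mathrm{GR} > \mathcal{O}_\mathrm{thr})$. We choose $\mathcal{O}_\mathrm{thr}$ to yield the desired $F$, and evaluate the corresponding efficiency $E = P(\mathcal{O}_\mathrm{AG} > \mathcal{O}_\mathrm{thr})$ from Eq. (11). Surprisingly, because the ratios of priors $P(\mathrm{AG})/P(\mathrm{GR})$ and the Occam factors are the same in $\mathcal{O}_\mathrm{GR}$ and $\mathcal{O}_\mathrm{AG}$, their only effect is to rescale $\mathcal{O}_\mathrm{thr}$, and they cancel out when we compute $E$ as a function of $F$. We can then work with the renormalized odds ratios

$$\mathcal{O}'_\mathrm{GR} = e^{x^2/2}, \qquad \mathcal{O}'_\mathrm{AG} = e^{x^2/2 + \sqrt{2}\,x\,\mathrm{SNR}_\mathrm{AG} + \mathrm{SNR}^2_\mathrm{AG}}, \qquad (15)$$

where $\mathrm{SNR}_\mathrm{AG} = \mathrm{SNR}\sqrt{1 - \mathrm{FF}}$ plays the role of an effective SNR for AG detection.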

This is not to say that the priors $P(\mathrm{AG})$ and $P(\mathrm{GR})$ are unimportant. Indeed, our prior degree of belief in AG sets our requirements for $F$ [31]. From basic Bayesian reasoning, the probability that AG is true when it is "detected" by the odds-ratio decision scheme is

$$P(\mathrm{AG}|\mathrm{detected}) = \frac{E \times P(\mathrm{AG})}{E \times P(\mathrm{AG}) + F \times P(\mathrm{GR})} = \left(1 + \frac{F\,P(\mathrm{GR})}{E\,P(\mathrm{AG})}\right)^{-1}; \qquad (16)$$

since GR is so well tested, it seems reasonable that $P(\mathrm{AG}) \ll P(\mathrm{GR})$; then $F$ must be $\ll P(\mathrm{AG})$ if we are to believe that we have truly detected AG, because a false alarm is a priori much more probable than a true detection.
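To make the role of the prior concrete, Eq. (16) can be evaluated for a few false-alarm probabilities. The Python sketch below uses a hypothetical prior $P(\mathrm{AG}) = 10^{-3}$ and efficiency $E = 1/2$; these numbers are illustrative assumptions, not values from the paper.

```python
def prob_ag_given_detection(efficiency, false_alarm, prior_ag):
    """Eq. (16): probability that AG is true after a claimed detection."""
    prior_gr = 1.0 - prior_ag
    return efficiency * prior_ag / (efficiency * prior_ag + false_alarm * prior_gr)

for false_alarm in (1e-2, 1e-4, 1e-6):
    p = prob_ag_given_detection(efficiency=0.5, false_alarm=false_alarm, prior_ag=1e-3)
    print(f"F = {false_alarm:.0e}: P(AG|detected) = {p:.3f}")
```

For this choice of prior, a false-alarm probability of $10^{-2}$ leaves a claimed detection very unlikely to be real (about 0.05), while $F = 10^{-4}$ and $10^{-6}$ raise the probability to about 0.83 and 0.998: $F$ has to fall well below $P(\mathrm{AG})$ before a detection becomes credible.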



Combining Eq. (14) with the definition of $F$ and the sampling distribution $p(x) = e^{-x^2/2}/\sqrt{2\pi}$, we obtain

$$F = \mathrm{erfc}\left(\sqrt{\log \mathcal{O}'_\mathrm{thr}}\right), \qquad (17)$$

with $\mathrm{erfc}(z) = 1 - \mathrm{erf}(z)$ the complementary error function, defined from the error function (footnote 5) $\mathrm{erf}(z) = (2/\sqrt{\pi})\int_0^z e^{-t^2} dt$. Likewise, combining Eq. (11) with the definition of $E$ and $p(x)$, we find

$$E = 1 - \frac{1}{2}\left[\mathrm{erf}\left(-\mathrm{SNR}_\mathrm{AG} + \sqrt{\log \mathcal{O}'_\mathrm{thr}}\right) - \mathrm{erf}\left(-\mathrm{SNR}_\mathrm{AG} - \sqrt{\log \mathcal{O}'_\mathrm{thr}}\right)\right]. \qquad (18)$$

Next, we solve Eq. (17) for $\mathcal{O}'_\mathrm{thr}$ and replace it in Eq. (18):

$$E = 1 - \frac{1}{2}\left[\mathrm{erf}\left(-\mathrm{SNR}_\mathrm{AG} + \mathrm{erfc}^{-1}(F)\right) - \mathrm{erf}\left(-\mathrm{SNR}_\mathrm{AG} - \mathrm{erfc}^{-1}(F)\right)\right], \qquad (19)$$

where $z = \mathrm{erfc}^{-1}(P)$ is the solution of $\mathrm{erfc}(z) = P$. Solving $E(\mathrm{SNR}_\mathrm{AG}) = 1/2$ yields the $\mathrm{SNR}_\mathrm{AG}$ required for confident AG detection as a function of $F$, ranging from 2.75 to 4.05 for $F = 10^{-4}$ down to $10^{-8}$. The GW-detection SNR required for AG detection is just $\mathrm{SNR}_\mathrm{AG}(F)/\sqrt{1 - \mathrm{FF}}$, and it is plotted in Fig. 1 for $F = 10^{-4}$ and $10^{-8}$. We already discussed the meaning of these curves in Sec. I.

An interesting question to ask is what detection rates would be needed in a volume-limited search so that the loudest observed signal could be used to detect AG corrections of given FF. In such a search, neglecting cosmological effects for simplicity, source distances are distributed as $p(D) = (3/D_\mathrm{hor})(D/D_\mathrm{hor})^2$, out to the horizon distance $D_\mathrm{hor}$ where sources are detected at the threshold $\mathrm{SNR}_\mathrm{thr}$. For $N$ GW detections, the minimum distance is distributed (footnote 6) as $p(D_\mathrm{min}) = (3N/D_\mathrm{hor})(D_\mathrm{min}/D_\mathrm{hor})^2\,[1 - (D_\mathrm{min}/D_\mathrm{hor})^3]^{N-1}$, which has median $D_\mathrm{hor}(1 - 2^{-1/N})^{1/3}$. It follows that the median maximum SNR is $\mathrm{SNR}_\mathrm{thr}(1 - 2^{-1/N})^{-1/3}$. Setting this equal to $\mathrm{SNR}_\mathrm{AG}(F)/\sqrt{1 - \mathrm{FF}}$ and solving for $N$, we obtain the required number of detections, which scales as $(1 - \mathrm{FF})^{-3/2}$, and is shown in Fig. 1 on the right-side vertical axis for $\mathrm{SNR}_\mathrm{thr} = 8$.

Figuring out what happens if we pool together the evidence from a number of observed events [41] of the same kind is a little harder computationally. The odds ratios take forms similar to the one-signal case:

$$\mathcal{O}'_\mathrm{GR} = e^{\sum_i x_i^2/2}, \qquad \mathcal{O}'_\mathrm{AG} = e^{\sum_i [x_i^2/2 + \sqrt{2}\,x_i\,\mathrm{SNR}_{\mathrm{AG},i} + \mathrm{SNR}^2_{\mathrm{AG},i}]}, \qquad (20)$$

where the $x_i$ are independently distributed normal variables with zero mean and unit variance, and the $\mathrm{SNR}_{\mathrm{AG},i}$ are the effective AG-detection SNRs for the individual observations. Here I limit myself to a small Monte Carlo exploration: assuming for simplicity that the FF is the same for all the sources, and taking the median over all $\{\mathrm{SNR}_{\mathrm{AG},i}\}$ realizations in a volume-limited search with $\mathrm{SNR}_\mathrm{thr} = 8$, I find that with $F = 10^{-4}$ we need ~ 9/200/4,500 observations to detect AG with $1 - \mathrm{FF} = 10^{-2}/10^{-3}/10^{-4}$, to be compared with ~ 28/900/30,000 using evidence from the loudest source alone. Essentially, because SNRs are distributed as $1/\mathrm{SNR}^4$, the Bayesian-inference problem is dominated by a few very loud events, and there are not very many of those for moderate detection rates. (However, this conclusion differs from the findings of Ref. [33], and it would be interesting to understand why.)

5 With this definition, the c.d.f. of a normal variable $x$ with zero mean and unit variance is $\mathrm{cdf}(x) = \frac{1}{2}[1 + \mathrm{erf}(x/\sqrt{2})]$.

6 Why? Consider first the minimum $x_\mathrm{min}$ among $N$ variables independently and uniformly distributed in [0,1]. Its distribution is $p(x_\mathrm{min}) = N(1 - x_\mathrm{min})^{N-1}$, since we could pick any of the $N$ as the minimum, and then its probability of being in $[x_\mathrm{min}, x_\mathrm{min} + dx]$ is just $dx$ times the probability that the other $N - 1$ are in $[x_\mathrm{min}, 1]$. The minimum $y_\mathrm{min}$ among $N$ variables with distribution $p(y)$ follows from the transformation $x = \mathrm{cdf}(y)$, from which $p(y_\mathrm{min}) = p(x_\mathrm{min})\,|dx/dy|_{y_\mathrm{min}}$.
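The single-event results of this section can be reproduced with a few lines of Python. The sketch below is a minimal illustration using only the standard library: the inverse complementary error function is obtained by bisection (a choice made here for self-containment), and the function names are hypothetical. It evaluates Eq. (19), solves $E = 1/2$ for $\mathrm{SNR}_\mathrm{AG}$, and computes the number of detections whose median loudest SNR reaches $\mathrm{SNR}_\mathrm{AG}(F)/\sqrt{1 - \mathrm{FF}}$.

```python
import math

def erfc_inv(p, lo=0.0, hi=10.0):
    """Invert math.erfc on [lo, hi] by bisection (erfc is decreasing)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def efficiency(snr_ag, false_alarm):
    """Eq. (19): detection efficiency E for given SNR_AG and F."""
    z = erfc_inv(false_alarm)
    return 1.0 - 0.5 * (math.erf(-snr_ag + z) - math.erf(-snr_ag - z))

def snr_ag_for_efficiency(false_alarm, target=0.5, lo=0.0, hi=20.0):
    """Bisection solve of E(SNR_AG) = target; E grows with SNR_AG."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if efficiency(mid, false_alarm) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def detections_needed(one_minus_ff, false_alarm, snr_thr=8.0):
    """N for which the median loudest SNR equals SNR_AG(F) / sqrt(1 - FF)."""
    snr_needed = snr_ag_for_efficiency(false_alarm) / math.sqrt(one_minus_ff)
    if snr_needed <= snr_thr:
        return 1.0  # any detected event is already loud enough
    u = (snr_thr / snr_needed) ** 3  # equals 1 - 2**(-1/N)
    return -math.log(2.0) / math.log(1.0 - u)

for false_alarm in (1e-4, 1e-8):
    print(f"F = {false_alarm:.0e}: SNR_AG = {snr_ag_for_efficiency(false_alarm):.2f}")
for one_minus_ff in (1e-2, 1e-3, 1e-4):
    print(f"1 - FF = {one_minus_ff:.0e}: N = {detections_needed(one_minus_ff, 1e-4):.0f}")
```

Because $\mathrm{erf}(-\mathrm{SNR}_\mathrm{AG} - z)$ is essentially $-1$ for the values of $F$ considered, the solution of $E = 1/2$ lies very close to $\mathrm{SNR}_\mathrm{AG} = \mathrm{erfc}^{-1}(F)$.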



IV. DISCUSSION 

In this paper I have shown that, under the assump- 
tions of strong signals and Gaussian detector noise, the 
prospects for detecting alternative-gravity corrections to 
general relativity can be characterized very simply by 
computing a single number, the fitting factor between 
the GR and AG waveform families, and then obtaining
the source SNR (shown in Fig. 1) required for the
alternative-gravity hypothesis to be favored in a decision 
scheme based on the Bayesian odds ratio. 

This happens because the FF is an SNR-independent measure of the strength of the AG corrections $\Delta h_\perp$ that cannot be reabsorbed by changing the GR source parameters from their true values. The GR parameters are not known a priori, but must be determined from the same observation, so such "reabsorbable" AG effects cannot be detected positively, and they would result in a "fundamental bias" on the GR parameters if AG is true, but post-detection parameter estimation is performed with GR model templates. In Ref. [25], Cornish and colleagues call such errors "stealth bias" if they are comparable to or larger than the noise-induced statistical errors in the GR parameters, and yet AG cannot be detected positively. In the terms of this paper, stealth bias corresponds to FF very close to one and AG-induced errors $(F^{-1})^{ij}(\Delta h, h_j)$ that are large compared to the Fisher-matrix statistical errors $\sqrt{(F^{-1})^{ii}}$.

My formalism can also be applied to other contexts (footnote 7) where we need to decide between a simpler model and one with additional parameters: for instance, binary inspirals of nonspinning vs. spinning compact objects, orbit-aligned vs. precessing spins, or point-like vs. extended-object dynamics.

7 I thank Ilya Mandel for pointing this out and providing these examples.

My formulas cannot predict what happens when the 
high-SNR, linearized-parameter approximation is not 
warranted; in that case full-fledged Monte Carlo integra- 
tion [Ml HH] would be required for accurate predic- 
tions. Whether SNRs are high enough can be determined 
using the test described in Sec. VI of Ref. [44] . I note 
however that it is for the strongest signals that GR-by- 
GW tests become most interesting, and that the results 
given above would persist as the leading-order contribu- 
tions to the evidence (see again [44], Sec. VII). 

Beyond the statistical characterization of the tests, we 
should always ask ourselves what it is that we could really 
detect, and whether we should really believe a positive 
AG detection if we get it. These are very hard questions, 
but the results of this paper suggest some qualitative 
considerations. 

First, it seems evident that a test based on matching AG corrections of a certain functional form $\Delta h$ would only be sensitive to non-GR effects that have nonzero projection along $\Delta h$. (For instance, AG waveforms with additional phasing parameters would not be sensitive to amplitude corrections.) Now, both the consistency checks based on altering PN coefficients and the parametrized post-Einstein framework consider rather general corrections, so it may be hard to imagine that the waveform imprint from any reasonable AG theory would be fully orthogonal to them. Indeed, Ref. [53] argues that for quasicircular binary inspirals, the well-posedness of the initial-value problem restricts possible phasing terms to frequency powers $f^{n/3}$ (where $n$ can be negative), which could be covered in the ppE scheme. However, if the projection is small, the resulting $1 - \mathrm{FF}$ would be strongly reduced, and the test would be sensitive only to much larger effects.

Second, any positive detection of an AG correction $\Delta h$ could also be explained as one of many systematic waveform corrections [45] that have nonzero projection along $\Delta h$, such as the effects of detector calibration and non-Gaussian detector noise, of standard-GR physics not included in the waveforms (spins, eccentricity, higher-PN terms), and of astrophysical perturbations (accretion disks, three-body systems). All of these effects should be considered a priori more likely than a modification of the extensively tested GR, so they must be controlled by including them explicitly in the GR model, or at least by establishing that they are sufficiently orthogonal to AG corrections. On the plus side, instrumental systematics would be different for the same signal as observed in multiple detectors, and GR-theoretical and astrophysical systematics would be different for multiple signals from similar sources, which would help discriminate AG corrections [46]. Nevertheless, preliminary claims of sensitivity to specific AG corrections may be overoptimistic, because $\Delta h$ could be largely reabsorbed by systematic effects that are initially neglected.

Testing GR with GWs remains one of the exciting fron- 
tiers of GW astronomy, but appropriate caution is needed 
to provide the proper context for current and future in- 
vestigations, and to allocate research effort wisely as we 
move toward the GW detection era. Computing some 
FFs will help. 